
    A Suffix Based Morphological Analysis of Assamese Word Formation

    Languages have several important features, such as parts of speech, tenses, prefixes and suffixes, which play major roles in fulfilling the purpose of the language. In Assamese, suffixation is a highly sensitive and unavoidable factor in word formation. Suffixes are letters or groups of letters placed immediately after nouns, pronouns, adjectives, verbs, adverbs and so on, to intensify the contextual meaning of the newly formed words. Because of the inflectional nature of suffixation, it often creates new words that differ in part of speech and meaning from the original words to which the suffixes are attached. Suffixation is therefore a morphodynamic process through which new words are generated from old ones by changing their form, function and meaning, thus increasing the lexical inventory of the Assamese language. This study can provide a theoretical basis for understanding the lexical generativity of suffixes in the formation of Assamese words.

    Reordering of Source Side for a Factored English to Manipuri SMT System

    Translation between similar languages with massive parallel corpora is readily handled by large-scale systems using either Statistical Machine Translation (SMT) or Neural Machine Translation (NMT). Translation between low-resource language pairs with linguistic divergence, however, has always been a challenge. We consider one such pair, English-Manipuri, which is linguistically divergent and belongs to the low-resource category. For such language pairs, SMT has been received more favourably than NMT. However, the more prominent phrase-based model of SMT translates groupings of surface word forms treated as phrases. Lacking any linguistic knowledge, it fails to learn a proper mapping between source and target language symbols. Our model adopts a factored SMT model (FSMT3*) with part-of-speech (POS) tags as a factor to incorporate linguistic information about the languages, followed by hand-coded reordering. Reordering the source sentences makes them resemble the target language, allowing a better mapping between source and target symbols. The reordering also converts long-distance reordering problems into monotone reordering, which SMT models handle better, thereby reducing the load at decoding time. Additionally, we find that adding POS feature data improves the system's precision. Experimental results with automatic evaluation metrics show that our model improves over phrase-based and other factored models that use the lexicalised Moses reordering options. Compared with the factored model with lexicalised phrase reordering (FSMT2), FSMT3* increases the automatic translation scores by 11.05% (Bilingual Evaluation Understudy), 5.46% (F1), 9.35% (precision) and 2.56% (recall).
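
    As a rough illustration of POS-based source-side reordering, the sketch below applies two toy rules to a Penn Treebank-tagged English sentence: prepositions are moved after the noun phrase they govern, and verbs are moved to the end of the clause. It assumes Manipuri is verb-final (SOV) and marks case after the noun; the rules, tag sets and function names are hypothetical and are not the hand-coded rules used for FSMT3*. Tagging is assumed to have been done already.

```python
from typing import List, Tuple

# Each token is (word, Penn Treebank POS tag); the sentence is assumed pre-tagged.
Token = Tuple[str, str]

# Hypothetical tag sets chosen for this toy example only.
VERB_TAGS = {"MD", "VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}
NP_TAGS = {"DT", "JJ", "NN", "NNS", "NNP", "NNPS", "PRP", "PRP$", "CD"}


def prepositions_to_postpositions(tokens: List[Token]) -> List[Token]:
    """Move each preposition (tag IN) after the noun phrase that follows it."""
    out: List[Token] = []
    i = 0
    while i < len(tokens):
        if tokens[i][1] == "IN":
            j = i + 1
            # Advance j over the contiguous noun-phrase tokens after the preposition.
            while j < len(tokens) and tokens[j][1] in NP_TAGS:
                j += 1
            out.extend(tokens[i + 1:j])  # emit the noun phrase first
            out.append(tokens[i])        # then the preposition, now postposed
            i = j
        else:
            out.append(tokens[i])
            i += 1
    return out


def verbs_to_end(tokens: List[Token]) -> List[Token]:
    """Move all verb tokens to the end of the clause, keeping final punctuation last."""
    tail = tokens[-1:] if tokens and tokens[-1][1] == "." else []
    body = tokens[:len(tokens) - len(tail)]
    others = [t for t in body if t[1] not in VERB_TAGS]
    verbs = [t for t in body if t[1] in VERB_TAGS]
    return others + verbs + tail


def reorder(tokens: List[Token]) -> List[Token]:
    """Apply both toy rules to approximate a verb-final, postpositional order."""
    return verbs_to_end(prepositions_to_postpositions(tokens))


if __name__ == "__main__":
    sentence = [("John", "NNP"), ("ate", "VBD"), ("an", "DT"), ("apple", "NN"),
                ("in", "IN"), ("the", "DT"), ("garden", "NN"), (".", ".")]
    print(" ".join(word for word, _ in reorder(sentence)))
    # prints: John an apple the garden in ate .
```

    Because the reordered English already follows the assumed target word order, a decoder can translate it largely monotonically, which is the effect the abstract attributes to reordering.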

    An empirical study on English-Mizo Statistical Machine Translation with Bible Corpus

    Machine Translation (MT) is the process of automatically converting text or speech in one natural language into another language by machine. This work presents a bidirectional Statistical Machine Translation (SMT) system for Mizo-English, an extremely low-resource language pair, built in a low-resource setting. A total of 30,800 sentences were collected from the English Bible dataset and manually translated into Mizo by a native linguistic expert to create the English-Mizo parallel dataset. After several pre-processing steps, the parallel dataset is used to build the MT system with the MOSES toolkit. The framework uses GIZA++ to create the Translation Model (TM) and IRSTLM to estimate the target-language model probabilities. The quality of the MT system is evaluated automatically with two metrics, BLEU and METEOR, and manually with two parameters, adequacy and fluency.
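
    For readers unfamiliar with BLEU, the sketch below computes corpus-level BLEU as it is usually defined: clipped n-gram precisions p_n for n = 1 to 4, combined as BP * exp((1/4) * sum(log p_n)), where the brevity penalty BP is 1 if the total candidate length c exceeds the total reference length r and exp(1 - r/c) otherwise. It assumes one tokenised reference per sentence and applies no smoothing; it is a hypothetical illustration, not the scoring script used in this work, and for reported scores an established implementation such as sacreBLEU is preferable.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams (as tuples) occurring in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(candidates: List[List[str]], references: List[List[str]],
                max_n: int = 4) -> float:
    """Unsmoothed corpus BLEU with a single reference per candidate sentence."""
    clipped = [0] * max_n  # clipped n-gram matches, summed over the corpus
    totals = [0] * max_n   # candidate n-gram counts, summed over the corpus
    cand_len = ref_len = 0
    for cand, ref in zip(candidates, references):
        cand_len += len(cand)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            cand_ngrams = ngram_counts(cand, n)
            ref_ngrams = ngram_counts(ref, n)
            # Clip each candidate n-gram count by its count in the reference.
            clipped[n - 1] += sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
            totals[n - 1] += max(len(cand) - n + 1, 0)
    # Without smoothing, any zero precision makes the geometric mean zero.
    if cand_len == 0 or min(clipped) == 0:
        return 0.0
    log_mean = sum(math.log(c / t) for c, t in zip(clipped, totals)) / max_n
    # Brevity penalty: penalise output that is shorter than the references overall.
    bp = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return bp * math.exp(log_mean)


if __name__ == "__main__":
    hyp = "the cat sat on a mat".split()
    ref = "the cat sat on the mat".split()
    print(round(corpus_bleu([hyp], [ref]), 4))  # prints 0.5373
```

    In the example, the clipped precisions are 5/6, 3/5, 2/4 and 1/3, and the lengths are equal, so the score is the fourth root of their product, (1/12)^(1/4).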

    Building Manipuri-English machine readable dictionary by implementing ontology

    Any system that hopes to process natural languages as people do must have information about words: their meanings, the concepts they express and their counterparts in another language, since meaningful sentences are composed of meaningful words. Traditionally, this information is provided through electronic dictionaries, but their entries evolved for the convenience of human readers, not machines. Machine-readable electronic dictionaries have therefore become central resources for natural language applications. Dictionaries and other lexical resources are not yet widely available in electronic form for Manipuri, and there is no Manipuri-English machine-readable dictionary that provides both lexical resources and conceptual information. This paper describes the process of developing a Manipuri-English dictionary by implementing an ontology. This implementation should provide a more effective combination of traditional Manipuri-English bilingual lexicographic information and its conceptual information.